Building Bilingual Corpus based on Hybrid Approach for Myanmar-English Machine Translation
Abstract
Word alignment in bilingual corpora has been an active research topic in the machine translation community. In this paper, we describe a system that aligns English-Myanmar texts at the word level in parallel sentences. Aligning translated segments with their source segments is essential for building parallel corpora. Because word alignment research on the Myanmar-English language pair is still in its infancy, this is not a trivial task. A parallel corpus is a collection of texts in two languages, one of which is the translation equivalent of the other. The main purpose of this system is therefore to construct a word-aligned parallel corpus for use in Myanmar-English machine translation. The proposed approach combines a corpus-based approach with a dictionary lookup approach. The corpus-based approach relies on the first three IBM models and the Expectation Maximization (EM) algorithm. The dictionary lookup approach uses a bilingual Myanmar-English dictionary.
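To make the hybrid idea concrete, the following is a minimal sketch, not the authors' implementation. It trains only IBM Model 1 (the simplest of the three models the abstract mentions) with EM and then lets a dictionary hit take precedence over the statistical link. The function names `train_ibm_model1` and `align`, the precedence rule, and the romanised toy tokens are all assumptions made for illustration; the abstract does not say how the two approaches are combined.

```python
from collections import defaultdict


def train_ibm_model1(pairs, iterations=10):
    """Estimate t(target word | source word) with EM (IBM Model 1 only)."""
    # Start with equal weight for every co-occurring word pair.
    t = {(w, s): 1.0 for src, tgt in pairs for s in src for w in tgt}
    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        # E-step: collect expected (fractional) alignment counts.
        for src, tgt in pairs:
            for w in tgt:
                norm = sum(t[(w, s)] for s in src)
                for s in src:
                    frac = t[(w, s)] / norm
                    count[(w, s)] += frac
                    total[s] += frac
        # M-step: renormalise the counts into probabilities.
        for (w, s), c in count.items():
            t[(w, s)] = c / total[s]
    return t


def align(src, tgt, t, dictionary=None):
    """Return (source index, target index) links for one sentence pair.

    Assumed combination rule: a dictionary hit wins; otherwise the
    source word with the highest t(target | source) is chosen.
    """
    dictionary = dictionary or {}
    links = []
    for j, w in enumerate(tgt):
        hits = [i for i, s in enumerate(src) if w in dictionary.get(s, ())]
        if hits:
            links.append((hits[0], j))
        else:
            best = max(range(len(src)), key=lambda i: t.get((w, src[i]), 0.0))
            links.append((best, j))
    return links


# Toy romanised Myanmar -> English pairs, invented for illustration only.
corpus = [
    (["ein"], ["house"]),
    (["ein", "gyi"], ["big", "house"]),
    (["sa-oke", "gyi"], ["big", "book"]),
    (["sa-oke"], ["book"]),
]
t = train_ibm_model1(corpus)
statistical_links = align(["ein", "gyi"], ["big", "house"], t)

# "large" never occurs in the corpus, so only the dictionary can link it.
toy_dictionary = {"gyi": {"large"}}
hybrid_links = align(["ein", "gyi"], ["large", "house"], t, toy_dictionary)
```

In this sketch the dictionary mainly helps with words that the small corpus never covered, which is one plausible motivation for a hybrid design when training data is scarce.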
Similar resources
Developing Word-aligned Myanmar-English Parallel Corpus based on the IBM Models
Word alignment in bilingual corpora has been an active research topic in the Machine Translation research groups. A corpus is a body of text collections, which is useful for Natural Language Processing (NLP). Parallel text alignment is the identification of the corresponding sentences in the parallel text. Large collections of parallel text are a prerequisite for many areas of linguistic research. Para...
Word Sense Disambiguation Based Myanmar-to-english Machine Translation System
Today, word sense disambiguation (WSD) is an important technique for many natural language processing (NLP) applications such as grammatical analysis, content analysis, information retrieval and machine translation. Among them, the WSD technique is used for machine translation to find the correct sense of a word in a specific context. In machine translation, the input sentences in the source la...
Building Multiword Expressions Bilingual Lexicons for Domain Adaptation of an Example-Based Machine Translation System
In this paper we describe a hybrid approach to automatically building bilingual lexicons of Multiword Expressions (MWEs) from parallel corpora. More specifically, we investigate the impact of using a domain-specific bilingual lexicon of MWEs on domain adaptation of an Example-Based Machine Translation (EBMT) system. We conducted experiments on the English-French language pair and two kinds of texts...
Myanmar Phrases Translation Model with Morphological Analysis for Statistical Myanmar to English Translation System
This paper presents a Myanmar phrase translation model with morphological analysis. The system is based on a statistical approach. In statistical machine translation, a large amount of information is needed to guide the translation process. When only a small amount of training data is available, morphological analysis is needed, especially for morphologically rich languages. Myanmar language is inflected language...
Development of Myanmar-English Bilingual WordNet like Lexicon
A bilingual concept lexicon is of significance for Information Extraction (IE), Machine Translation (MT), Word Sense Disambiguation (WSD) and the like. The Myanmar-English Bilingual WordNet like Lexicon (MEBWL) is developed to fulfill the requirements of Language Acquisition (LA). However, building such a lexicon is quite challenging and consumes considerable time and cost. To overcom...
Publication date: 2011